Abstract:
Associative classification is a supervised classification method that many experimental studies have shown to be promising. However, it suffers from a major drawback: the huge number of generated classification rules, from which the best ones must be selected, with some effort, to construct the classifier. To overcome this drawback, we propose in this paper a new direct associative classification method called IGARC, an improvement of the GARC approach that extracts generic associative classification rules directly from a training set in order to reduce the number of associative classification rules without jeopardising classification accuracy. Moreover, we propose an algorithm called PN-GARC that deals with negative classification rules. Considering negated items in the classification framework provides additional information describing the data and reduces conflicts when classifying new objects. Nevertheless, the number of rules becomes very large when negated items are considered. That is why we explore generic classification rules, both negative and positive, in order to study their behaviour and usefulness on the studied datasets. A detailed description of the IGARC method is presented, together with an experimental study on 12 benchmark datasets showing that it is highly competitive in terms of accuracy in comparison with popular classification approaches.
Abstract:
Classification based on association rule mining, also known as associative classification, is a promising approach in data mining that builds accurate classifiers. In this paper, a rule ranking process within the associative classification approach is investigated. Specifically, two common rule ranking methods in associative classification are compared with reference to their impact on accuracy. We also propose a new rule ranking procedure that adds more tie-breaking conditions to the existing methods in order to reduce random rule selection. In particular, our method looks at the class distribution frequency associated with the tied rules and favours those that are associated with the majority class. We compare the impact of the proposed rule ranking method and two other methods presented in associative classification on 14 highly dense classification data sets. Our results indicate the effectiveness of the proposed rule ranking method on the quality of the resulting classifiers for the majority of the benchmark problems we consider. This provides evidence that adding more appropriate constraints to break ties between rules positively affects the predictive power of the resulting associative classifiers.
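The tie-breaking idea described in this abstract can be illustrated with a minimal Python sketch. It assumes that rules are ordered first by confidence and then by support, which the abstract does not spell out, and only then by how frequent the rule's class is in the training data. The `Rule` type and `rank_rules` function are hypothetical names introduced for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical container for a class association rule."""
    antecedent: frozenset
    label: str
    confidence: float
    support: float


def rank_rules(rules, training_labels):
    """Sort rules by confidence, then support (both assumed criteria),
    then by the training-set frequency of the rule's class, so that
    tied rules predicting the majority class come first."""
    class_freq = Counter(training_labels)
    return sorted(
        rules,
        key=lambda r: (-r.confidence, -r.support, -class_freq[r.label]),
    )


rules = [
    Rule(frozenset({"a"}), "no", 0.90, 0.30),
    Rule(frozenset({"b"}), "yes", 0.90, 0.30),  # ties with the rule above
    Rule(frozenset({"c"}), "no", 0.95, 0.10),
]
labels = ["yes", "yes", "yes", "no", "no"]  # "yes" is the majority class
ranked = rank_rules(rules, labels)
```

Here the two rules tied on confidence and support are separated by class frequency, so the rule predicting "yes" is placed ahead of the one predicting "no" instead of being ordered arbitrarily.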
Abstract:
The paper presents the results of research on the efficiency of so-called rule quality measures, which are used to evaluate the quality of rules at each stage of rule induction. The stages of rule growing and pruning were considered, along with the issue of conflict resolution, which may arise during classification. The work continues earlier research on the efficiency of quality measures employed in the sequential covering rule induction algorithm. In this paper we analyse only the eight quality measures that were recognized as effective in previously conducted research. The study was conducted on approximately 70 benchmark datasets related to classification, regression and survival analysis problems. In the comparisons we analysed the prognostic abilities of the induced rules as well as the complexity of the resulting rule-based data models.
Abstract:
Interesting classification rules can be determined by a number of measures. When searching a domain for a characterisation of unique, different, but important data, an appropriate measure is diversity. Diversity as a measure of a classification rule is based on the relative distinctness of the rule from the other rules in the rule set. The diversity measure is the sum of the inverse of the commonness of a rule's items. In this paper, diversity is derived from the simplest classification trees using techniques from statistics and information retrieval, and demonstrated using sample datasets.
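The definition above (the sum of the inverse of the commonness of a rule's items) can be sketched in a few lines of Python. The abstract does not define commonness, so this sketch assumes it is the number of rules in the rule set that contain the item; `diversity_scores` is a hypothetical name.

```python
from collections import Counter


def diversity_scores(rules):
    """Return one diversity score per rule.

    Assumption: the commonness of an item is the number of rules in the
    rule set that contain it. A rule's diversity is then the sum of
    1 / commonness over its items.
    """
    commonness = Counter(item for rule in rules for item in set(rule))
    return [
        sum(1.0 / commonness[item] for item in set(rule))
        for rule in rules
    ]


rules = [{"a", "b"}, {"a", "c"}, {"a", "d", "e"}]
scores = diversity_scores(rules)
```

Under this assumption an item shared by every rule contributes little, while an item that appears in only one rule contributes a full point, so the third rule, with two items of its own, scores highest.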
Abstract:
Classification Association Rule Mining (CARM) systems operate by applying an Association Rule Mining (ARM) method to obtain classification rules from a training set of previously classified data. The rules thus generated will be influenced by the choice of ARM parameters employed by the algorithm (typically support and confidence threshold values). In this paper we examine the effect that this choice has on the predictive accuracy of CARM methods. We show that the accuracy can almost always be improved by a suitable choice of parameters, and describe a hill-climbing method for finding the best parameter settings. We also demonstrate that the proposed hill-climbing method is most effective when coupled with a fast CARM algorithm such as the TFPC algorithm which is also described.
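As a rough illustration of tuning support and confidence thresholds by hill climbing, the sketch below uses a generic textbook procedure, not the specific method of the paper, whose details the abstract does not give. The `evaluate` callback stands in for training and scoring a CARM classifier; `hill_climb` and `toy_accuracy` are hypothetical names, and the toy function exists only to make the example runnable.

```python
def hill_climb(evaluate, start, step, min_step=0.01):
    """Generic hill climbing over (support, confidence) thresholds in percent.

    Move to the best of the four axis-aligned neighbours while that
    improves the score; otherwise halve the step size, stopping once
    the step falls below min_step.
    """
    support, confidence = start
    best = evaluate(support, confidence)
    while step >= min_step:
        scored = [
            (evaluate(support + ds, confidence + dc), support + ds, confidence + dc)
            for ds, dc in ((step, 0), (-step, 0), (0, step), (0, -step))
            if 0 <= support + ds <= 100 and 0 <= confidence + dc <= 100
        ]
        top = max(scored, default=None)
        if top is not None and top[0] > best:
            best, support, confidence = top
        else:
            step /= 2  # no better neighbour: refine the search
    return support, confidence, best


def toy_accuracy(support, confidence):
    """Stand-in for classifier accuracy, peaking at support=2, confidence=60."""
    return 90 - (support - 2) ** 2 - 0.01 * (confidence - 60) ** 2


s, c, acc = hill_climb(toy_accuracy, start=(10, 50), step=4)
```

Because every step requires rebuilding and evaluating a classifier, the cost of `evaluate` dominates, which is consistent with the abstract's remark that the approach works best with a fast CARM algorithm.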
Abstract:
Traditional classification techniques such as decision trees and RIPPER use heuristic search methods to find a small subset of patterns. In recent years, a promising new approach called associative classification, which mainly uses association rule mining in classification, has been proposed. Most associative classification algorithms adopt the exhaustive search method presented in the famous Apriori algorithm to discover the rules and require multiple passes over the database. Furthermore, they find frequent items in one phase and generate the rules in a separate phase, consuming more resources such as storage and processing time. In this paper, a new associative classification method called Multi-class Classification based on Association Rules (MCAR) is presented. MCAR takes advantage of vertical format representation and uses an efficient technique for discovering frequent items based on recursively intersecting the frequent items of size n to find potential frequent items of size n + 1. Moreover, since rule ranking plays an important role in classification and the majority of current associative classifiers, such as CBA and CMAR, select rules mainly in terms of their confidence levels, MCAR aims to improve upon the CBA and CMAR approaches by adding more tie-breaking constraints in order to limit random selection. We also show that shuffling the training data objects before mining can substantially affect the predictive power of some well-known associative classification techniques. After experimentation with 20 different data sets, the results indicate that the proposed algorithm is highly competitive in terms of error rate and efficiency compared with decision trees, rule induction methods and other popular associative classification methods. Finally, we show the effectiveness of the MCAR rule sorting method on the quality of the produced classifiers for 12 highly dense benchmark problems.
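The vertical-format step mentioned in this abstract, intersecting the frequent items of size n to find candidates of size n + 1, can be sketched as follows. This is a minimal illustration of transaction-id list intersection in general, not MCAR itself: it ignores class labels and rule generation, and `vertical_format` and `extend` are hypothetical names.

```python
from itertools import combinations


def vertical_format(transactions):
    """Map each single item (as a 1-itemset) to the set of transaction ids
    in which it occurs."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(frozenset([item]), set()).add(tid)
    return tidlists


def extend(frequent, min_count):
    """Intersect the id lists of frequent n-itemsets to obtain the
    frequent (n + 1)-itemsets, without rescanning the transactions."""
    result = {}
    for (a, tids_a), (b, tids_b) in combinations(frequent.items(), 2):
        candidate = a | b
        if len(candidate) == len(a) + 1:
            tids = tids_a & tids_b
            if len(tids) >= min_count:
                result[candidate] = tids
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
min_count = 2
level1 = {
    itemset: tids
    for itemset, tids in vertical_format(transactions).items()
    if len(tids) >= min_count
}
level2 = extend(level1, min_count)
level3 = extend(level2, min_count)
```

The support of each larger itemset is simply the size of the intersected id list, so the data set is read only once, when the vertical layout is built.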
Abstract:
This paper introduces a novel proposal to discover the best associative classification rules through studying the influence of the attributes used in robust catalogues. Notice that a catalogue is defined as a dataset free of duplicate records. Moreover, a robust catalogue is obtained when incomplete records and those with uncertainty are eliminated from a catalogue. Therefore, a robust catalogue is a collection of association rules with 100% confidence and unitary support. In this paper we demonstrate that robust catalogues contain the same association rules as the datasets from which they were obtained, but can be managed in memory without eliminating any data from the analysis. In fact, the experiments performed show that all robust catalogues contained in a classification dataset are easily obtained, providing millions of associative classification rules with 100% confidence to the expert researcher in data mining. (c) 2017 Elsevier Ltd. All rights reserved.
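The two definitions in this abstract can be illustrated with a short Python sketch. It rests on two assumptions that the abstract does not confirm: an incomplete record is one with a missing (`None`) value, and a record "with uncertainty" is one whose attribute values also appear with a different class label. `robust_catalogue` is a hypothetical name.

```python
def robust_catalogue(records):
    """Build a robust catalogue from (attributes, label) records.

    Assumptions: None marks a missing value, and a record is uncertain
    when the same attribute tuple occurs with more than one label.
    """
    catalogue = set(records)  # a catalogue has no duplicate records
    complete = {
        (attrs, label)
        for attrs, label in catalogue
        if label is not None and None not in attrs
    }
    labels_by_attrs = {}
    for attrs, label in complete:
        labels_by_attrs.setdefault(attrs, set()).add(label)
    return {
        (attrs, label)
        for attrs, label in complete
        if len(labels_by_attrs[attrs]) == 1
    }


records = [
    (("red", "small"), "yes"),
    (("red", "small"), "yes"),   # duplicate
    (("red", "large"), "yes"),
    (("red", "large"), "no"),    # conflicts with the record above
    (("blue", None), "no"),      # incomplete
    (("blue", "small"), "no"),
]
robust = robust_catalogue(records)
```

Under these assumptions each surviving record occurs once and maps its attribute values to a single class, which matches the abstract's description of rules with 100% confidence and unitary support.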
Abstract:
Packet classification is the foundation of many Internet functions such as QoS and security. A long thread of research has proposed efficient software-based solutions to this problem. Such software solutions are attractive because they require cheap memory systems for implementation, thus bringing down the overall cost of the system. In contrast, hardware-based solutions use more expensive memory systems, e.g., TCAMs, but are often preferred by router vendors for their faster classification speeds. The goal of this paper is to find a 'best-of-both-worlds' solution: one that incurs the cost of a software-based system and has the speed of a hardware-based one. Our proposed solution, called smart rule cache, achieves this goal by using minimal hardware (a few additional registers) to cache evolving rules which preserve classification semantics, and additional logic to match incoming packets to these rules. Using real traffic traces and real rule sets from a tier-1 ISP, we show such a setup is sufficient to achieve very high hit ratios for fast classification in hardware. Cache miss ratios are 2 to 4 orders of magnitude lower than those of flow cache schemes. Given its low cost and good performance, we believe our solution may create significant impact on current industry practice.
Abstract:
Associative classification (AC) is a promising data mining approach that integrates classification and association rule discovery to build classification models (classifiers). In the last decade, several AC algorithms have been proposed, such as Classification based Association (CBA), Classification based on Predicted Association Rule (CPAR), Multi-class Classification using Association Rule (MCAR), Live and Let Live (L^3) and others. These algorithms use different procedures for rule learning, rule sorting, rule pruning, classifier building and class allocation for test cases. This paper sheds light on and critically compares common AC algorithms with reference to the above-mentioned procedures. Moreover, data representation formats in AC mining are discussed along with potential new research directions.
Abstract:
A new approach for analyzing the “molecule–descriptor” matrix for the QSAR problem (Quantitative Structure–Activity Relationship) based on a fuzzy cluster structure of the learning sample is presented. The ways for generating fast rules for refusing prediction and searching the spikes in the learning sample are described. For this purpose, a special space of descriptors, simple for calculation, is introduced. The ways for optimizing the discriminant function according to fuzzy clustering parameters are examined. Highly predictive models based on the presented approach have been generated. The models are compared, and the efficiency of the described methods is revealed.